An emerging challenge in the online classification of social media data streams is to keep the categories used for classification up-to-date. In this paper, we propose an innovative framework based on an Expert-Machine-Crowd (EMC) triad to help categorize items by continuously identifying novel concepts in heterogeneous data streams often riddled with outliers. We unify constrained clustering and outlier detection by formulating a novel optimization problem: COD-Means. We design an algorithm to solve the COD-Means problem and show that COD-Means will not only help detect novel categories but also seamlessly discover human annotation errors and improve the overall quality of the categorization process. Experiments on diverse real data sets demonstrate that our approach is both effective and efficient.